Abstract

It is known that if a Büchi context-free language (BCFL) consists of scattered
words, then there is an integer n, depending only on the language, such that the
Hausdorff rank of each word in the language is bounded by n. Every BCFL is a
Müller context-free language (MCFL). In the first part of the paper, we prove that
an MCFL of scattered words is a BCFL iff the rank of every word in the language
is bounded by an integer depending only on the language.

Then we establish operational characterizations of the BCFLs of well-ordered 
and scattered words. We prove that a language is a BCFL consisting of well-ordered 
words iff it can be generated from the singleton languages containing the letters of 
the alphabet by substitution into ordinary context-free languages and the $\omega$-power
operation. We also establish a corresponding result for BCFLs of scattered words 
and define expressions denoting BCFLs of well-ordered and scattered words. In the 
final part of the paper we give some applications. 



1 Introduction 



A word over an alphabet A is an isomorphism type of a labeled linear order. In this 
paper, in addition to finite and $\omega$-words, we also consider words whose underlying linear
order is any countable linear ordering, cf. [Sjj . Countable words and in particular regular 
words were first investigated in [12], where they were called "arrangements". Regular 
words were later studied in [U [6l [3 [281 ES] and more recently in [31]. Context-free 
words were introduced in [8] and their underlying linear orderings were investigated in 
co-financed by the European Regional Fund, and by the grant no. K 75249 from the National Foundation

Finite automata on $\omega$-words have by now a vast literature, see [33] for a comprehensive
treatment. Also, finite automata acting on well-ordered words longer than $\omega$ have been
investigated by many authors, a small sampling is [U [131 [m [371 [38]. In the last decade,
the theory of automata on well-ordered words has been extended to automata on all 
countable words, including scattered and dense words. In [21 [3l [l2], both operational 
and logical characterizations of the class of languages of countable words recognized by 
finite automata were obtained. 

Context-free grammars generating $\omega$-words were introduced in [15] and subsequently
studied in [11^ I32j. Context-free grammars generating arbitrary countable words were
defined in [22, 23]. Actually, two types of grammars were defined, context-free grammars
with Büchi acceptance condition (BCFG), and context-free grammars with Müller acceptance
condition (MCFG). These grammars generate the Büchi and the Müller context-free
languages of countable words, abbreviated as BCFLs and MCFLs. It is clear from
the definitions in [22, 23] that every BCFL is an MCFL. On the other hand, there exist
MCFLs of even well-ordered words that are not BCFLs, for example the set of all countable
well-ordered words over some alphabet. This is due to the fact that the order type
of every word in a BCFL of well-ordered words is bounded by the ordinal $\omega^n$, for some
integer n depending on the language, cf. [22]. More generally, it was shown in [22] that
for every BCFL L of scattered words there is an integer n such that the Hausdorff rank 
of every word in L is bounded by n. On the other hand, regarding MCFLs L of scattered 
words, two cases arise, cf. [23]. Either there exists an integer n such that the rank of
every word in L is bounded by n, or for every countable ordinal $\alpha$ there is a word in
L whose Hausdorff rank exceeds $\alpha$. It is then natural to ask whether every MCFL of
scattered words of the first type is a BCFL. In this paper, we answer this question: all 
such MCFLs are in fact BCFLs. Thus, the BCFLs of scattered words are exactly the 
"bounded" MCFLs of scattered words. 

Then we establish operational characterizations of the BCFLs of well-ordered and scat- 
tered words. We prove that a language is a BCFL consisting of well-ordered words iff it 
can be generated from the singleton languages containing the letters of the alphabet by 
substitution into ordinary context-free languages and the $\omega$-power operation. We also
establish a corresponding result for BCFLs of scattered words and define expressions 
denoting BCFLs of well-ordered and scattered words. In the final part of the paper, we 
give some applications of the main results. 

2 Basic notions 
2.1 Linear orderings

A linear ordering $(I, <)$ consists of a set $I$ and a strict linear order relation $<$ on $I$.
When the set $I$ is finite or countable, we call $(I, <)$ finite or countable as well. In the
rest of the paper, by a linear ordering we will always mean a countable ordering. A good
reference for linear orderings is [34].

A morphism of linear orderings $(I, <) \to (J, <')$ is a function $h : I \to J$ that preserves
the order relation, so that for all $x, y \in I$, if $x < y$ then $h(x) <' h(y)$. Since every
morphism is an injective function, we sometimes call a morphism an order embedding,
or just an embedding. If $I \subseteq J$ and the inclusion $I \hookrightarrow J$ is an order embedding, then we
say that $I$ is a sub-ordering of $J$. When $(I, <)$ is a sub-ordering of $(J, <')$, the relation $<$ is
the restriction of $<'$ onto $I$. An isomorphism is a bijective morphism. Isomorphic linear
orderings have the same order type. The order type of a well-ordering is a (countable)
ordinal. We identify the finite ordinals with the nonnegative integers.

Some examples of linear orderings are the usual ordering of the nonnegative integers
$(\mathbb{N}_+, <)$, the ordering of the negative integers $(\mathbb{N}_-, <)$, and the ordering $(\mathbb{Q}, <)$ of the
rationals. Their respective order types are denoted $\omega$, $-\omega$ and $\eta$.

Let $(I, <)$ be a linear ordering. We say that $(I, <)$ is a well-ordering if each nonempty
subset of $I$ has a least element. This condition is equivalent to requiring that $(I, <)$ has
no sub-ordering of order type $-\omega$. Moreover, we say that $(I, <)$ is dense if it has at least
two elements and for all $x, y \in I$ with $x < y$ there is some $z \in I$ with $x < z < y$. Finally,
we say that $(I, <)$ is scattered if it has no dense sub-ordering, and quasi-dense if it is not
scattered. It is well-known that every sub-ordering of a well-ordering is well-ordered, and
every sub-ordering of a scattered ordering is scattered. Moreover, up to isomorphism
there are four (countable) dense linear orderings, the ordering of the rationals possibly
endowed with a least or a greatest element, or both. The respective order types of these
linear orderings are $\eta$, $1 + \eta$, $\eta + 1$ and $1 + \eta + 1$. (See below for the sum operation on
order types.)

When $(I_1, <_1)$ and $(I_2, <_2)$ are linear orderings, their sum $(I_1, <_1) + (I_2, <_2)$ is the linear
ordering $(I, <)$, where $I = (I_1 \times \{1\}) \cup (I_2 \times \{2\})$, moreover, for all $(x, i), (y, j) \in I$,
$(x, i) < (y, j)$ iff $i = 1$ and $j = 2$, or $i = j$ and $x <_i y$. The sum operation may be
generalized. Suppose that $(J, <)$ is a linear ordering, and for each $j \in J$, $(I_j, <_j)$ is a
linear ordering. Then the generalized sum $\sum_{j \in J} (I_j, <_j)$ is the disjoint union

$\bigcup_{j \in J} I_j \times \{j\} = \{(x, j) : j \in J, x \in I_j\}$

equipped with the order relation $(x, j) < (y, k)$ iff $j < k$, or $j = k$ and $x <_j y$. We
call a generalized sum a well-ordered, a scattered, or a dense sum, when $(J, <)$ has the
appropriate property. It is known that every well-ordered sum of well-orderings is a
well-ordering, and similarly, every scattered sum of scattered orderings is scattered and every
dense sum of dense orderings is dense. When each $(I_j, <_j)$ is the linear ordering $(I, <')$,
then the generalized sum $\sum_{j \in J} (I_j, <_j)$ is called the product of $(I, <')$ and $(J, <)$, denoted
$(I, <') \times (J, <)$. When $(I, <)$ and $(J, <)$ are both well-ordered, scattered or dense, then
so is their sum or product. Since the above operations preserve isomorphism, they can
be extended to order types.
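The sum and product constructions can be tried out on finite orderings. The following is a minimal sketch, assuming that a finite linear ordering is represented as a Python list in increasing order; the function names `ordered_sum`, `generalized_sum` and `ordering_product` are ours and purely illustrative.

```python
def ordered_sum(first, second):
    """Sum I1 + I2: tag each element with 1 or 2; all of I1 precedes all of I2."""
    return [(x, 1) for x in first] + [(y, 2) for y in second]


def generalized_sum(index, summands):
    """Generalized sum over the ordered index list `index`.

    `summands[j]` is the ordering I_j (a list in increasing order).
    The result lists the pairs (x, j), ordered first by j and then by x in I_j.
    """
    return [(x, j) for j in index for x in summands[j]]


def ordering_product(i_order, j_order):
    """Product I x J: the generalized sum of copies of I indexed by J."""
    return generalized_sum(j_order, {j: i_order for j in j_order})
```

For instance, `ordering_product([0, 1], ["p", "q"])` lists the two-element ordering once for each index, so the copy indexed by `"p"` comes entirely before the copy indexed by `"q"`.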

Hausdorff classified scattered linear orderings into an infinite hierarchy. Following [30],
we present a variant of this hierarchy. Let $VD_0$ be the collection of all finite linear
orderings, and for a countable ordinal $\alpha > 0$, let $VD_\alpha$ be the collection of all finite sums
of linear orderings of the sort

$\sum_{n \in \mathbb{N}_+} (I_n, <_n)$ or $\sum_{n \in \mathbb{N}_-} (I_n, <_n)$,

where each $(I_n, <_n)$ is in $VD_{\beta_n}$ for some $\beta_n < \alpha$. By Hausdorff's theorem [34], a linear
ordering $(I, <)$ is scattered iff it belongs to $VD_\alpha$ for some (countable) ordinal $\alpha$. The
least such ordinal is called the rank of $I$, denoted $r(I)$. Hausdorff also proved that every
linear ordering is either scattered, or a dense sum of scattered linear orderings.

A useful fact is that a well-ordering has rank $\alpha$ iff its order type $\gamma$ satisfies $\omega^\alpha \le \gamma < \omega^{\alpha+1}$,
so that its Cantor normal form is

$\omega^{\alpha} \times n_0 + \omega^{\alpha_1} \times n_1 + \ldots + \omega^{\alpha_k} \times n_k$,

where $k \ge 0$, $\alpha > \alpha_1 > \ldots > \alpha_k$ and $n_0, \ldots, n_k$ are positive integers.
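To illustrate the last fact, here is a small sketch for ordinals below $\omega^\omega$ only. We assume, for illustration, that such an ordinal is given by its Cantor normal form as a list of (exponent, coefficient) pairs with strictly decreasing natural-number exponents; the names `cnf_add` and `rank` are ours.

```python
def cnf_add(gamma, delta):
    """Ordinal sum gamma + delta of two Cantor normal forms.

    Terms of gamma whose exponent is smaller than the leading exponent
    of delta are absorbed; equal leading exponents add their coefficients.
    """
    if not delta:
        return list(gamma)
    lead_exp, lead_coeff = delta[0]
    kept = [(e, n) for (e, n) in gamma if e > lead_exp]
    same = sum(n for (e, n) in gamma if e == lead_exp)
    return kept + [(lead_exp, lead_coeff + same)] + list(delta[1:])


def rank(gamma):
    """Rank of a well-ordering of order type gamma: the leading exponent.

    The empty ordering (empty list) is finite, hence of rank 0.
    """
    return gamma[0][0] if gamma else 0
```

For example, the order type $\omega^2 \times 3 + \omega \times 2 + 5$ is encoded as `[(2, 3), (1, 2), (0, 5)]` and has rank 2, while `cnf_add([(0, 1)], [(1, 1)])` reflects the identity $1 + \omega = \omega$.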



2.2 Words and languages 

A word (or arrangement [16]) $u$ over a possibly infinite alphabet $A$ is a linear ordering
$I = (I, <)$ labeled in $A$. Thus a word $u$ is of the form $(I, <, \lambda)$, where $\lambda : I \to A$. A
morphism between words preserves the order relation and the labeling. An isomorphism
is a bijective morphism. We usually identify isomorphic words. The order type of a word
is the order type of its underlying linear order.

Examples of words include the finite words whose underlying linear order is finite,
including the empty word $\epsilon$ whose order type is 0, the one-letter words $a^\omega$ and $a^{-\omega}$, labeled
$a$, whose underlying linear orders are the orderings of the nonnegative and the negative
integers, and the one-letter word $a^\eta$ whose underlying linear order is the ordering of the
rationals.

We call a word well-ordered, scattered, dense, or quasi-dense if its underlying linear
order has the appropriate property. The rank $r(u)$ of a scattered word $u$ is the rank of
its underlying linear ordering. For example, $a^\omega$ is well-ordered, $a^{-\omega}$ is scattered but not
well-ordered, and $a^\eta$ is dense. Also, $r(a^{-\omega}) = r(a^\omega) = 1$. More generally, when $\alpha$ is a
countable ordinal, $a^\alpha$ is the word whose underlying linear order is a well-order of order
type $\alpha$, with each point labeled $a$. The word $a^\omega a^\eta$ obtained by "concatenating" $a^\omega$ and
$a^\eta$ is quasi-dense, but not dense. (A formal definition of concatenation is given below.)

Let $A^\sharp$ denote the set of all words over $A$. As usual, we denote by $A^*$ and $A^\omega$ the sets of
all finite and all $\omega$-words over $A$, whose order type is $\omega$. We define $A^{\le\omega} = A^* \cup A^\omega$.

A language over $A$ is any subset of $A^\sharp$. In particular, every subset of $A^*$ is a language (of
finite words). Languages over A are equipped with the usual set theoretic operations. 
We now define the operation of substitution. 

Suppose that $L \subseteq A^\sharp$, and for each $a \in A$, $L_a \subseteq B^\sharp$. Then the language

$L[a \mapsto L_a]_{a \in A}$, or simply $L[a \mapsto L_a]$

is the language over $B$ consisting of all words obtained from the words $u \in L$ by replacing
each occurrence of a letter $a \in A$ in $u$ by a word $v \in L_a$. Different occurrences of the same
letter may be replaced by different words. Formally, suppose that $u = (I, <, \lambda) \in L$,
and for each $i \in I$, let $v_i = (I_i, <_i, \lambda_i)$ be a word in $L_{\lambda(i)}$. Then we construct the word
$u'$ whose underlying linear order is the ordered sum $\sum_{i \in I} (I_i, <_i)$ which is equipped
with the labeling function

$\lambda'(x, i) = \lambda_i(x)$

for all $i \in I$ and $x \in I_i$. The language $L[a \mapsto L_a]_{a \in A}$ consists of all such words $u'$. If $L$
and the $L_a$ contain only well-ordered, only scattered or only dense words, then the same holds
for $L[a \mapsto L_a]_{a \in A}$. Below we will often follow the convention of writing $L[a \mapsto L_a]_{a \in A_0}$
where $a$ ranges over a subset $A_0$ of $A$ to denote the substitution where each letter $a \in A_0$
is replaced by $L_a$ and each letter not in $A_0$ remains unchanged, i.e., is replaced by $\{a\}$.

When $L$ and each $L_a$ consists of a single word, say $L = \{u\}$ and $L_a = \{v_a\}$, then
$L[a \mapsto L_a]$ is also a singleton, and we denote its single element by $u[a \mapsto v_a]$. If $u$ and
each $v_a$ is well-ordered (resp. scattered, dense), then so is $u[a \mapsto v_a]$.

Using the generic operation of substitution, we now define further operations on
languages. Let $x_0, x_1, \ldots$ be letters. Suppose that $L, L_1, L_2 \subseteq A^\sharp$. Then we define

$L_1 L_2 = \{x_1 x_2\}[x_i \mapsto L_i]$

$L^\omega = \{x_0 x_1 \ldots\}[x_i \mapsto L] = \{u_0 u_1 \ldots : u_i \in L\}$
$L^{-\omega} = \{\ldots x_1 x_0\}[x_i \mapsto L] = \{\ldots u_1 u_0 : u_i \in L\}$

When $L = \{u\}$, $L_1 = \{u_1\}$ and $L_2 = \{u_2\}$ are singleton languages, we obtain the word
operations of concatenation $u_1 u_2$ and the unary $\omega$-power and $(-\omega)$-power operations
$u^\omega = uu\ldots$ and $u^{-\omega} = \ldots uu$.
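On finite languages of finite words, substitution can be computed directly. The sketch below is only an illustration under that restriction (it does not cover infinite words); words are Python strings, a language is a set of strings, and the names `substitute` and `concatenate` are ours.

```python
from itertools import product


def substitute(language, images):
    """Compute L[a -> L_a] for finite languages of finite words.

    `images` maps a letter a to the language L_a. Letters missing from
    `images` stay unchanged, i.e. they are replaced by {a}. Different
    occurrences of a letter may be replaced by different words.
    """
    result = set()
    for word in language:
        choices = [sorted(images.get(letter, {letter})) for letter in word]
        for pick in product(*choices):
            result.add("".join(pick))
    return result


def concatenate(l1, l2):
    """L1 L2 = {x1 x2}[x_i -> L_i], using '1' and '2' as placeholder letters."""
    return substitute({"12"}, {"1": l1, "2": l2})
```

Deriving concatenation from substitution, rather than defining it separately, mirrors the way the operations are introduced in the text.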

2.3 Context-free languages 

When $G = (N, A, R, S)$ is an ordinary context-free grammar (CFG), where $N$ is the set
of nonterminals, $A$ is the finite alphabet of terminals, $R$ is the set of rules and $S \in N$
is the start symbol, we may consider possibly infinite derivation trees over $G$. Such a
tree is a finitely branching rooted, ordered directed tree labeled in $N \cup A \cup \{\epsilon\}$ such that
whenever a vertex $x$ is labeled $X \in N$ and has $n$ successors, ordered as $x_1, \ldots, x_n$ and
labeled $X_1, \ldots, X_n \in N \cup A$, then $X \to X_1 \ldots X_n \in R$. When $n = 1$, it is also allowed
that $x_1$ is labeled $\epsilon$ and then $X \to \epsilon \in R$. The label of the root is called the root symbol.
A vertex with no successors is called a leaf. In particular, every vertex labeled in $\{\epsilon\}$
is a leaf. The leaves of a derivation tree $t$ form a linearly ordered set with respect to the
usual left-to-right ordering, and considering only the leaves labeled in $N \cup A$, we obtain a
word in $(N \cup A)^\sharp$. This word is called the frontier of $t$. When the root symbol of a finite
derivation tree is $X \in N \cup A$ and its frontier is $p \in (N \cup A)^*$, then we write $X \Rightarrow^* p$.
As usual, we extend $\Rightarrow^*$ to a binary relation over $(N \cup A)^*$. The context-free language
(CFL) generated by $G$ is $L(G) = \{u \in A^* : S \Rightarrow^* u\}$.

Suppose that $A$ is a finite alphabet. A Büchi context-free grammar (BCFG) over $A$ is a
CFG $(N, A, R, S)$ equipped with a designated subset $N_\infty$ of the nonterminals $N$. When
$G = (N, A, R, S, N_\infty)$ is a BCFG, call a derivation tree proper if along each infinite path
(originating in the root) there are infinitely many vertices labeled in $N_\infty$. When the root
of such a tree is labeled $X$ and its frontier is the word $p \in (N \cup A)^\sharp$, we write $X \Rightarrow^\infty p$ (or
$X \Rightarrow^* p$ when the tree is finite) and say that $p$ is derivable from $X$. The language $L(G)$
generated by $G = (N, A, R, S, N_\infty)$ is the set of all words $u \in A^\sharp$ such that $S \Rightarrow^\infty u$. We
say that $L \subseteq A^\sharp$ is a Büchi context-free language (BCFL), if $L = L(G)$ for some BCFG
$G$.

We also define Müller context-free grammars (MCFG) that are CFGs $(N, A, R, S)$ equipped
with a set $\mathcal{F} \subseteq P_+(N)$ of nonempty subsets of $N$. We say that a derivation tree is proper
if for each infinite path, the set of nonterminals that label an infinite number of vertices
along the path belongs to $\mathcal{F}$. When $X$ is the root label of a proper derivation tree having
frontier $p$, then we write $X \Rightarrow^\infty p$, or $X \Rightarrow^* p$ when the tree is finite. The language
$L(G)$ generated by such a grammar $G = (N, A, R, S, \mathcal{F})$ consists of those words $u \in A^\sharp$
such that there is a proper derivation tree $t$ whose root is labeled $S$ having frontier $u$,
in notation, $S \Rightarrow^\infty u$. We say that a language $L \subseteq A^\sharp$ is a Müller context-free language
(MCFL) if $L = L(G)$ for some MCFG $G$. We say that two BCFGs or MCFGs are
equivalent if they generate the same language.

It is clear that every BCFL is an MCFL. It is not difficult to see that a language $L \subseteq A^*$
is a BCFL iff it is an MCFL iff it is an ordinary context-free language (CFL), cf. [22, 23].
On the other hand, there exists an MCFL that is not a BCFL, for example the set of all
well-ordered words over a one-letter alphabet, cf. [22].

BCFLs and MCFLs are closely related to Büchi and Müller tree automata [33, 36], since
a language is a BCFL (MCFL, resp.) iff it is the frontier language of a tree language
recognized by a Büchi tree automaton (Müller tree automaton).

We say that a BCFG or an MCFG has no useless symbols if either it has a single
nonterminal, the start symbol $S$, and no rules, or for each nonterminal $X$ there are finite
words $p, q$ and a possibly infinite terminal word $u$ with $S \Rightarrow^* pXq$ and $X \Rightarrow^\infty u$. It then
follows that there exist also terminal words $v, w \in A^\sharp$ with $S \Rightarrow^\infty vXw$.

It is known that for each BCFG (MCFG, resp.) there is an equivalent BCFG (MCFG, 
resp.) having no useless nonterminals. 

3 Linear context-free languages 

In this section, we define linear BCFLs and MCFLs and prove their equivalence. We will 
later use these linear languages as building blocks to construct more general BCFLs and 
MCFLs of scattered words. 

Recall that a CFG $G = (N, A, R, S)$ is called linear if the right-hand side of each rule
in $R$ contains at most one occurrence of a nonterminal. A linear language in $A^*$ is the
language generated by a linear grammar. We call a BCFG (MCFG, respectively) linear
if its underlying context-free grammar is linear. A linear BCFL (MCFL, respectively) is
a BCFL (MCFL) that is generated by a linear BCFG (MCFG). Since every BCFL is an
MCFL, every linear BCFL is a linear MCFL.

Note that when $G$ is linear, then every derivation tree has a single maximal path that
contains all the nonterminal labeled vertices. We call this path the principal path of the
derivation tree. Every vertex that does not belong to the principal path is a leaf labeled
in $A \cup \{\epsilon\}$. It follows from this fact that the order type of each word of a linear BCFL
or MCFL is either a finite ordinal $n$, or of the form $\omega + n$, $n + (-\omega)$ or $\omega + (-\omega)$. Thus,
every word of a linear BCFL or MCFL is scattered of rank at most 1.

Linear BCFLs and MCFLs are closely related to Büchi automata and Müller automata,
cf. [33]. By a Büchi automaton we mean a system $\mathcal{A} = (Q, A, \delta, q_0, F, Q_\infty)$, where $Q$ is
the finite nonempty set of states, $A$ is the finite input alphabet, $\delta \subseteq Q \times A \times Q$ is the
transition relation, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is the set of final states and $Q_\infty$
is a designated subset of $Q$. A run of $\mathcal{A}$ on a word $u \in A^{\le\omega}$ is defined as usual. A run
on a finite word is successful if it starts in the initial state and ends in a state in $F$. A
run on an $\omega$-word is successful if it visits at least one state in $Q_\infty$ infinitely often. The
language accepted by $\mathcal{A}$ consists of all words $u \in A^{\le\omega}$ such that $\mathcal{A}$ has a successful run
on $u$. A Müller automaton $\mathcal{A} = (Q, A, \delta, q_0, F, \mathcal{F})$ is defined similarly, but instead of a
subset of $Q$, the last component $\mathcal{F}$ is a designated subset of $P_+(Q)$. An infinite run is
called successful if it starts in the initial state and the set of states visited infinitely often
belongs to $\mathcal{F}$. The language accepted by a Müller automaton $\mathcal{A}$ is the set of all words
$u \in A^{\le\omega}$ on which $\mathcal{A}$ has a successful run. It is well-known that a language is accepted
by a Büchi automaton iff it is accepted by a Müller automaton. The notion of Büchi
automata and Müller automata may be generalized without altering the computation
power by allowing a finite number of transitions of the form $(q, u, q')$ where $q, q' \in Q$ and
$u \in A^*$.
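For ultimately periodic words $u v^\omega$, the Büchi acceptance condition can be checked mechanically. The following sketch assumes a nondeterministic automaton given by a set `delta` of triples $(q, a, q')$ with single-letter labels; the restriction to words of the form $u v^\omega$ with $v$ nonempty, and the function names, are our assumptions for illustration.

```python
def step(delta, states, letter):
    """States reachable from `states` by reading one letter."""
    return {q2 for (q1, a, q2) in delta if q1 in states and a == letter}


def accepts_lasso(delta, q0, q_inf, u, v):
    """Is there a run on u v^omega visiting some state of q_inf infinitely often?"""
    assert v, "the periodic part must be nonempty"
    states = {q0}
    for a in u:
        states = step(delta, states, a)
    n = len(v)

    def succ(node):
        # A node is a pair (state, position inside v).
        q, i = node
        return {(q2, (i + 1) % n) for (q1, a, q2) in delta if q1 == q and a == v[i]}

    def reachable(starts):
        seen, todo = set(), list(starts)
        while todo:
            node = todo.pop()
            if node not in seen:
                seen.add(node)
                todo.extend(succ(node))
        return seen

    # Accept iff some designated node is reachable and lies on a cycle.
    for node in reachable({(q, 0) for q in states}):
        if node[0] in q_inf and node in reachable(succ(node)):
            return True
    return False
```

The check relies on the pigeonhole argument that a run visiting $Q_\infty$ infinitely often must repeat some pair (designated state, position in $v$), so it suffices to look for a reachable cycle through such a pair.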

Lemma 3.1 Every linear MCFL is a BCFL. 

Proof. Suppose that $G = (N, A, R, S, \mathcal{F})$ is a linear MCFG. Let us construct the following
Müller automaton $\mathcal{A}$:

• The set of states is $N \cup \{Z_0\}$, where $Z_0$ is a new symbol.

• The set of terminals is $R$.

• The set of transitions consists of the triples $(X, r, Y)$ such that $r$ is a rule of the
form $X \to uYv$ for some $u, v \in A^*$, together with all transitions $(X, r, Z_0)$ such
that $r$ is a rule of the form $X \to u$ with $u \in A^*$.

• The initial state is $S$.

• The set of final states is $\{Z_0\}$.

• The designated subsets of the state set are those in $\mathcal{F}$.

Clearly, this Müller automaton accepts all words in $R^{\le\omega}$ which arise as the sequence of
rules applied along the principal path of some derivation tree rooted $S$ whose frontier is
a terminal word.

This Müller automaton has an equivalent Büchi automaton, say $\mathcal{B} = (Q, R, \delta, q_0, F, Q_\infty)$,
where $F$ is the set of final states and $Q_\infty$ is the set of designated states. Then let us
construct the BCFG $G' = (Q, A, R', q_0, Q_\infty)$, where $R'$ consists of all rules $q \to uq'v$
such that for some $r \in R$, $(q, r, q') \in \delta$ and $r$ is a rule of the form $X \to uYv$ for some
$X, Y \in N$, together with all rules of the form $q \to u$ such that for some $r \in R$ and $q' \in F$,
$(q, r, q') \in \delta$, moreover, $r$ is of the form $X \to u$ for some $X$. Since $\mathcal{B}$ is equivalent to $\mathcal{A}$,
it follows from the description of the language accepted by $\mathcal{A}$ that $G$ is equivalent to $G'$.

□ 
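The first step of the construction, reading off the transitions of the automaton from the rules of a linear grammar, can be sketched as follows. We assume, for illustration only, that each rule is a tuple `(X, u, Y, v)` standing for $X \to uYv$, with `Y = None` for a rule $X \to u$, that rules are identified by their index in the list, and that the string `"Z0"` is not already a nonterminal; the function name is ours. The Müller family and the conversion to a Büchi automaton are not modeled.

```python
def principal_path_automaton(rules, start):
    """Transitions over the alphabet of rule indices, as in the proof above.

    A rule X -> uYv at index r yields the transition (X, r, Y);
    a rule X -> u at index r yields the transition (X, r, "Z0").
    """
    z0 = "Z0"  # assumed fresh, i.e. not the name of any nonterminal
    delta = set()
    for r, (x, _u, y, _v) in enumerate(rules):
        delta.add((x, r, y if y is not None else z0))
    return {"delta": delta, "initial": start, "final": {z0}}
```

The terminal words $u, v$ play no role in the automaton itself; they are only needed again when the rules $q \to uq'v$ of the grammar $G'$ are read back from the transitions.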

An operational characterization of linear BCFLs was given in [21]. In order to recall this
result, we extend the $\omega$-power operation to sets of pairs of words.

When $A$ is a finite alphabet, we may consider ordered pairs $(u, v) \in A^\sharp \times A^\sharp$ that form
a monoid with respect to the product operation

$(u, v)(u', v') = (uu', v'v)$

with the pair $(\epsilon, \epsilon)$ acting as identity. Then we may consider the power set of this monoid,
$P(A^\sharp \times A^\sharp)$, and equip this set with the operations of set union and complex product:

$U \cdot V = \{(u, v)(u', v') : (u, v) \in U, (u', v') \in V\} = \{(uu', v'v) : (u, v) \in U, (u', v') \in V\}$.

With these operations and the constants $\emptyset$ and $\{(\epsilon, \epsilon)\}$, $P(A^\sharp \times A^\sharp)$ is an idempotent
semiring [26]. We may also define a star operation by

$U^* = \bigcup_{n \ge 0} U^n$.

The set $P(A^\sharp)$ of all subsets of $A^\sharp$ is a commutative idempotent monoid with respect
to the operation of set union and the constant $\emptyset$. We define an action of the semiring
$P(A^\sharp \times A^\sharp)$ on $P(A^\sharp)$ by

$U \circ L = \{uwv : (u, v) \in U, w \in L\}$.

Moreover, we define an $\omega$-power operation $P(A^\sharp \times A^\sharp) \to P(A^\sharp)$,

$U \mapsto U^\omega = \{(u_0 u_1 \ldots) \cdot (\ldots v_1 v_0) : (u_i, v_i) \in U\}$.

Note that the $\omega$-power operation $L^\omega$ defined earlier on languages $L \subseteq A^\sharp$ can now be
expressed as $L^\omega = (L \times \{\epsilon\})^\omega$, and the $(-\omega)$-power operation by $L^{-\omega} = (\{\epsilon\} \times L)^\omega$. (The
semiring-semimodule pair $(P(A^\sharp \times A^\sharp), P(A^\sharp))$ equipped with the star and $\omega$-power
operations is in fact an iteration semiring-semimodule pair, cf. [5, 24].)
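The finitary operations on sets of pairs can be experimented with on finite words. The sketch below is restricted to pairs of finite strings and to a truncated star (the $\omega$-power is not modeled); all function names are ours and purely illustrative.

```python
def pair_product(p, q):
    """(u, v)(u', v') = (uu', v'v): left parts grow rightwards, right parts leftwards."""
    (u, v), (u2, v2) = p, q
    return (u + u2, v2 + v)


def complex_product(U, V):
    """U . V: all products of a pair from U with a pair from V."""
    return {pair_product(p, q) for p in U for q in V}


def star_upto(U, k):
    """Finite approximation of U*: the union of U^n for 0 <= n <= k."""
    power, result = {("", "")}, {("", "")}
    for _ in range(k):
        power = complex_product(power, U)
        result |= power
    return result


def act(U, L):
    """The action U o L = {uwv : (u, v) in U, w in L}."""
    return {u + w + v for (u, v) in U for w in L}
```

With $U = \{(a, b)\}$, the language `act(star_upto(U, 2), {""})` consists of the words $a^n b^n$ for $n \le 2$, which illustrates why the pair monoid is suited to describing linear languages.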

Using algebraic tools, the following Kleene theorem for linear BCFLs was proved in [21]
as a special case of a more general theorem. It is also possible to derive this result from
the classical Kleene theorem for regular $\omega$-languages, cf. [33].

Theorem 3.2 A language $L \subseteq A^\sharp$ is a linear BCFL iff it is a finite union of languages
of the sort

where $U$ and $V$ can be generated from the finite subsets of $A^* \times A^*$ by the operations $\cup$,
$\cdot$ and $^*$.

It follows from this result also that the Hausdorff rank of every word in a linear BCFL 
is at most 1. 

Corollary 3.3 A language $L \subseteq A^\sharp$ is a linear BCFL iff it is a union of an ordinary
linear CFL $L_0 \subseteq A^*$ with a finite union of languages of the sort

where $U$ and $V$ can be generated from the finite subsets of $A^* \times A^*$ by the operations
$\cup$, $\cdot$ and $^*$, moreover, $U$ is nonempty and $V$ contains at least one pair one of whose
components is not $\epsilon$.

Corollary 3.4 A language $L \subseteq A^\sharp$ of well-ordered words is a linear BCFL iff it is a
union of an ordinary linear CFL $L_0 \subseteq A^*$ with a finite union of languages of the sort

where $K_0, K_1 \subseteq A^*$ are ordinary nonempty regular languages and $K_1$ contains at least
one nonempty word.

Proof. This follows from the previous corollary by noting that when $U$ and $V$ are not
empty, then $U \circ V$ contains only well-ordered words iff the second component of each
pair in $U \cup V$ is $\epsilon$. □

Note that the order type of every word of a linear BCFL of well-ordered words is at most $\omega$.

Corollary 3.5 A language $L \subseteq A^\omega$ is a linear BCFL iff it is a language $L \subseteq A^{\le\omega}$ that
can be accepted by a Büchi automaton (that is regular).



4 Context-free languages of scattered words of bounded 
rank 

Call a language $L$ of scattered words bounded¹ if the rank of the words of $L$ is bounded
by an integer. It was proved in [22] that every BCFL of scattered words is bounded. In
this section our aim is to prove that when $L$ is a bounded language of scattered words,
then $L$ is an MCFL iff $L$ is a BCFL. We will derive this result from the fact, established
below, that every language generated by a "non-reproductive MCFG" is a BCFL.

¹This notion has nothing to do with the classical notion of a bounded language $L \subseteq A^*$.

When $G = (N, A, R, S)$ is a CFG, the graph $\Gamma_G$ has $N$ as its vertex set and edges
$X \to Y$ if there is a rule of the form $X \to pYq$. We say that a nonterminal $Y$ is
accessible from $X$ if there is a path from $X$ to $Y$ in $\Gamma_G$. A subset $N' \subseteq N$ is strongly
connected if for all $X, Y \in N'$, $Y$ is accessible from $X$. A strong component is a maximal
strongly connected subset. The height of a nonterminal $X$ is the length $n$ of the longest
sequence $Y_0, Y_1, \ldots, Y_n$ of nonterminals belonging to different strong components such
that $Y_n = X$ and $Y_{i-1}$ is accessible from $Y_i$ for each $1 \le i \le n$. The above notions all
extend to BCFGs and MCFGs.
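The graph $\Gamma_G$, its strong components and the height function are all computable from the rules. The sketch below assumes that a rule is a pair `(lhs, rhs)` with `rhs` a list of symbols, and that accessibility includes paths of length zero, so that every singleton is strongly connected; the function names are ours.

```python
def access_graph(nonterminals, rules):
    """Edges X -> Y of the graph: Y is a nonterminal occurring in a rule for X."""
    return {x: {s for (lhs, rhs) in rules if lhs == x for s in rhs if s in nonterminals}
            for x in nonterminals}


def reachable(graph, x):
    """All nonterminals accessible from x (including x itself)."""
    seen, todo = {x}, [x]
    while todo:
        for y in graph[todo.pop()]:
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def heights(nonterminals, rules):
    """Map each nonterminal to its height."""
    graph = access_graph(nonterminals, rules)
    reach = {x: reachable(graph, x) for x in nonterminals}
    # The strong component of x: everything mutually accessible with x.
    comp = {x: frozenset(y for y in reach[x] if x in reach[y]) for x in nonterminals}
    memo = {}

    def height(c):
        if c not in memo:
            below = {comp[y] for x in c for y in reach[x]} - {c}
            memo[c] = 1 + max(map(height, below)) if below else 0
        return memo[c]

    return {x: height(comp[x]) for x in nonterminals}
```

Since the strong components form an acyclic graph, the recursion in `height` terminates, and the height is the length of the longest descending chain of components starting from the component of the given nonterminal.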

We recall from [23] that a nonterminal $X$ of an MCFG $G = (N, A, R, S, \mathcal{F})$ is reproductive
if there is a word $p \in (N \cup A)^\sharp$ containing an infinite number of occurrences of $X$ with
$X \Rightarrow^\infty p$. We call $G$ non-reproductive if it has no reproductive nonterminal.

Theorem 4.1 Suppose that $G = (N, A, R, S, \mathcal{F})$ is a non-reproductive MCFG. Then
there is a BCFG equivalent to $G$.

Proof. Suppose that $L = L(G)$ for a non-reproductive MCFG $G = (N, A, R, S, \mathcal{F})$. We
may suppose that $L(G)$ is nonempty, since in the opposite case our claim is obvious.
Since for each infinite path of a derivation tree, the nonterminals visited infinitely often
form a strongly connected set, without loss of generality we may assume that each $F \in \mathcal{F}$
is itself strongly connected and thus included in some strong component.

Let $X$ be a nonterminal of height $n$. We show by induction on $n$ that the language
$L(X) = \{u \in A^\sharp : X \Rightarrow^\infty u\}$ generated from $X$ is a BCFL. Suppose that we have proved
this claim for all nonterminals of height $< n$. Let $C$ denote the strong component of $X$.
If $Y \to u$ is in $R$ with $Y \in F$, for some $F \in \mathcal{F}$, $F \subseteq C$, then $u$ contains at most one
occurrence of a nonterminal in $F$, and if it does, then no other nonterminal in $C$ occurs
in $u$, see also [29].

Indeed, suppose to the contrary that $Y \to pZq \in R$, where $Y, Z \in F$ for some $F \in \mathcal{F}$
with $F \subseteq C$, and suppose that $pq$ contains an occurrence of a nonterminal in $C$. Since
$F$ is strongly connected, there is a sequence of rules

$Z \to p_1 Z_1 q_1, \ldots, Z_{m-1} \to p_m Y q_m$

with $F = \{Y, Z, Z_1, \ldots, Z_{m-1}\}$. Let $p' = p_1 \ldots p_m$, $q' = q_m \ldots q_1$. Then $Y \Rightarrow^\infty
(pp')^\omega (q'q)^{-\omega}$. By assumption, $p$ or $q$ contains a nonterminal in $C$. By symmetry, we
may assume that $p$ contains such a nonterminal. Since $C$ is strongly connected, there is
a finite derivation $p \Rightarrow^* rYs$ for some words $r, s$. Thus we have $Y \Rightarrow^\infty (rYsp')^\omega (q'q)^{-\omega}$,
contradicting the assumption that $G$ is non-reproductive.

Let $N'$ denote the set of all nonterminals not in $C$ accessible from $X$. For each $F \in \mathcal{F}$,
$F \subseteq C$ and $Y \in F$, let us consider the MCFG $G_{F,Y}$ whose set of nonterminals is $F$ and
whose terminals are the letters in $N' \cup A$. The start symbol is $Y$ and the only designated
subset of $F$ is $F$ itself. The rules are those rules of $R$ whose left side is in $F$ and whose
right side contains a nonterminal in $F$. Note that $G_{F,Y}$ is linear. Let $L_{F,Y} \subseteq (N' \cup A)^\sharp$
denote the language generated by the MCFG $G_{F,Y}$.

Moreover, let $G_X$ denote the ordinary CFG whose set of nonterminals is $C$, set of terminals
is $N' \cup A \cup C'$, where $C' = \{(F, Y) : Y \in F \subseteq C, F \in \mathcal{F}\}$, and whose start symbol
is $X$. The rules of $G_X$ are all rules in $R$ whose left side belongs to $C$, together with all
rules $Y \to (F, Y)$ with $Y \in F \subseteq C$, $F \in \mathcal{F}$. Let $L_X \subseteq (N' \cup A \cup C')^*$ denote the ordinary
context-free language generated by $G_X$.

Now the language $L(X) \subseteq A^\sharp$ can be constructed as follows. First, for each $(F, Y)$ as
above, let us substitute words of $L(Z) \subseteq A^\sharp$ for each occurrence of each nonterminal
$Z \in N'$ in the words of $L_{F,Y}$ to obtain the language

$L'_{F,Y} = L_{F,Y}[Z \mapsto L(Z)] \subseteq (A \cup C')^\sharp$.

Then, let us substitute words of $L'_{F,Y}$ for the occurrences of each letter $(F, Y)$ in words
of $L_X$, where $Y \in F$ and $F \in \mathcal{F}$. We have that

$L(X) = L_X[(F, Y) \mapsto L'_{F,Y}] \subseteq A^\sharp$.

Since for every $Y \in F \subseteq C$ and $F \in \mathcal{F}$, the grammar $G_{F,Y}$ is linear, each $L_{F,Y}$ is a
BCFL by Lemma 3.1. By the induction hypothesis, each $L(Z)$ with $Z \in N'$ is also a
BCFL. Finally, $L_X$ is a BCFL since every ordinary CFL is a BCFL. Since BCFLs are
closed under substitution, cf. [22], it follows now that $L(X)$ is a BCFL. □

Corollary 4.2 A language of scattered words is a BCFL iff it can be generated by an
MCFG having no reproductive nonterminals.

Proof. The sufficiency of the claim is immediate from Theorem 4.1. In order to prove the
necessity, recall from [22] that every BCFL of scattered words over a finite alphabet $A$
can be generated by a BCFG $G = (N, A, R, S, F)$ such that for every strong component
$C$ containing a nonterminal in $F$ and every rule $X \to p$ with $X \in C$, $p$ contains at most
one occurrence of a nonterminal in $C$. Then let $G' = (N, A, R, S, \mathcal{F})$, where $\mathcal{F}$ is the set
of all subsets of $N$ containing at least one nonterminal in $F$. Clearly, $G'$ is an MCFG
equivalent to $G$ containing no reproductive nonterminal. □
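The passage from the Büchi condition to the Müller condition used in this proof is a plain enumeration of subsets. As a small sketch, with nonterminals represented by strings and a function name of our own choosing:

```python
from itertools import combinations


def muller_family(nonterminals, buchi_set):
    """All subsets of the nonterminals that contain a designated nonterminal."""
    items = sorted(nonterminals)
    designated = set(buchi_set)
    return {frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if designated & set(c)}
```

The family has exponential size in the number of nonterminals, so this is meant only to make the definition concrete on small grammars.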

Corollary 4.3 An MCFL $L$ of scattered words is a BCFL iff $L$ is bounded. A bounded
language of scattered words is a BCFL iff it is an MCFL.

Proof. Suppose that $L$ is a bounded MCFL of scattered words. It is known, cf. [23], that
$L$ can be generated by a non-reproductive MCFG. Thus, $L$ is a BCFL by Theorem 4.1.

Suppose now that $L$ is a BCFL of scattered words. Then, as shown in [22], $L$ is bounded.

□ 

The previous corollary answers a question in [23]. 

5 Operational characterization of BCFLs of scattered words 

In this section, we provide a Kleene-type characterization of the class of BCFLs consisting 
of well-ordered, or scattered words. We show how these languages may be constructed 
using ordinary context-free languages and the $\omega$-power operation defined earlier. We will
also define expressions denoting BCFLs of well-ordered and scattered words.

Lemma 5.1 A language $L \subseteq A^\sharp$ is a BCFL of scattered words iff it can be constructed
from the singleton languages $\{a\}$ for $a \in A$ by substitution into ordinary CFLs and linear
BCFLs: Suppose that $B$ is a finite alphabet and $L_0$ is a CFL or a linear BCFL over $B$,
and suppose that for each $b \in B$, $L_b \subseteq A^\sharp$, then construct the language $L_0[b \mapsto L_b] \subseteq A^\sharp$.

Proof. The sufficiency follows from the fact that the languages $\{a\}$, for $a \in A$, are BCFLs
of scattered words as is every ordinary CFL or linear BCFL, and that both the class of
all BCFLs and the class of all languages of scattered words are closed under substitution.

In order to prove the necessity, without loss of generality we may assume that $L \subseteq A^\sharp$
is a nonempty BCFL of scattered words which does not contain $\epsilon$. Thus, by a result
in [22], $L = L(G)$ for a BCFG $G = (N, A, R, S, F)$ having no useless nonterminals and
such that $L(X) = \{u \in A^\sharp : X \Rightarrow^\infty u\}$ contains at least one nonempty word for each
$X \in N$. Since $G$ contains no useless nonterminals, it follows that each $L(X)$ consists of
scattered words, and well-ordered words if every word in $L$ is well-ordered. Indeed, if
$L(X)$ contains a quasi-dense word $u$, then since there exist words $v, w$ with $S \Rightarrow^\infty vXw$,
$L$ contains the quasi-dense word $vuw$, contrary to our assumptions. Moreover, if $u$ is
not well-ordered, then $vuw$ is also not well-ordered.

Let $X \in N$ be a nonterminal of height $n$. We show by induction on $n$ that the language
$L(X) = \{u \in A^\sharp : X \Rightarrow^\infty u\}$ can be constructed as claimed. Suppose that we have
proved this claim for all nonterminals of height $< n$. Let $C$ denote the strong component
of $X$. If $C$ contains a nonterminal in $F$, then for any $Y \to p$ in $R$ with $Y \in C$, the
word $p$ contains at most one occurrence of a nonterminal in $C$, cf. [22]. Then let $N'$
denote the set of all nonterminals $Y$ not in $C$ accessible from $X$, and consider the BCFG
$G' = (C, N' \cup A, R', X, F')$, where $R'$ is the set of all rules in $R$ whose left side is in $C$
and $F' = C \cap F$. This grammar is linear so that $L' = L(G')$ is a linear BCFL. It is clear
that $L(X)$ can be constructed from $L'$ by substituting the language $L(Y)$ for all $Y \in N'$
(and the language $\{a\}$ for all $a \in A$).

Suppose now that $C$ contains no nonterminal in $F$. Then consider the ordinary CFG
$G' = (C, N' \cup A, R', X)$ where $N'$ and $R'$ are defined as above. Now $L(X)$ is the language
obtained by substituting $L(Y)$ for each $Y \in N'$ in $L(G')$. □
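
The case distinction in this proof can be sketched in code. The following is a minimal
Python illustration, not the authors' algorithm: it assumes a grammar without useless
nonterminals is given as a dictionary from nonterminals to lists of right-hand sides, with
every symbol that is not a key treated as a terminal. The names `reachable`, `classify`
and the encoding itself are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Rules = Dict[str, List[List[str]]]  # nonterminal -> list of right-hand sides


def reachable(rules: Rules, start: str) -> Set[str]:
    """Nonterminals accessible from `start` (including `start` itself)."""
    seen, stack = {start}, [start]
    while stack:
        y = stack.pop()
        for rhs in rules.get(y, []):
            for sym in rhs:
                if sym in rules and sym not in seen:
                    seen.add(sym)
                    stack.append(sym)
    return seen


def classify(rules: Rules, final: Set[str], x: str) -> Tuple[str, Set[str], Set[str]]:
    """Return (kind of L(G'), strong component C of x, set N') as in the proof."""
    from_x = reachable(rules, x)
    # C: nonterminals that are accessible from x and from which x is accessible.
    comp = {y for y in from_x if x in reachable(rules, y)}
    # N': nonterminals outside C accessible from x; they act as letters of G'.
    outside = from_x - comp
    # If C meets F, the local grammar G' is a linear BCFG; otherwise it is a CFG.
    kind = "linear BCFL" if comp & final else "ordinary CFL"
    return kind, comp, outside
```

For instance, with rules `S -> a S T | T` and `T -> b T | b` and `F = {S}`, the component
of `S` is `{S}`, it meets `F`, and `T` becomes a letter of the local grammar.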

Lemma 5.2 A language $L \subseteq A^\sharp$ is a BCFL of well-ordered words iff it can be
constructed from the singleton languages $\{a\}$ for $a \in A$ by substitution into ordinary CFLs
and linear BCFLs consisting of well-ordered words.

Proof. The proof of the sufficiency is the same as above. In order to prove the necessity,
we argue as before using the fact that if $C$ contains a nonterminal in $F$, then for any
$Y \to p$ in $R$ with $Y \in C$, the word $p$ contains at most one occurrence of a nonterminal
in $C$, and if it occurs, it is the last letter of $p$, cf. [22]. □

Theorem 5.3 A language $L \subseteq A^\sharp$ is a BCFL of well-ordered words iff it can be generated
from the languages $\{a\}$, $a \in A$, by substitution into ordinary CFLs and the operation of
$\omega$-power.

Proof. If $L$ can be generated from the languages $\{a\}$, $a \in A$, by applying the two
operations $n$ times, then since the $\omega$-power operation can be described as substitution
into the linear BCFL $\{b^\omega\}$ over the one-letter alphabet $\{b\}$, it follows by Lemma 5.1 that
$L$ is a BCFL. On the other hand, it is clear that all words in $L$ are well-ordered.

In order to prove the opposite direction, by Lemma 5.2 all we need to show is that
substitution into a linear BCFL consisting of well-ordered words can be expressed by
substitution into ordinary CFLs and the $\omega$-power operation. But this is clear from
Corollary 3.4 (or Corollary 3.5). □

Expressions denoting ordinary CFLs, similar to the regular expressions denoting regular
tree languages [25], were introduced in [27]. A variant of these expressions are the
well-known $\mu$-expressions used in several branches of computer science including process
algebra and programming logics. By adding the operation of $\omega$-power to $\mu$-expressions
in an appropriate way, we now define expressions denoting BCFLs of scattered words.

Let us fix a countably infinite set of variables. For a finite alphabet $A$, let $\mu\omega T$ denote
the set of all expressions generated by the grammar

$T ::= a \mid \varepsilon \mid x \mid T + T \mid T \cdot T \mid \mu x.T \mid T_0^\omega$

where $a$ is any letter in $A$, $x$ ranges over the variables, and the $\omega$-power operation is
restricted to closed terms $T_0$. (A term is closed if each occurrence of a variable $x$ is in
the scope of a prefix $\mu x$.) The semantics of expressions is defined by induction in the
expected way. When the free variables of an expression $t$ form the set $V$, then $t$ denotes
a language $|t| \subseteq (A \cup V)^\sharp$. (We assume that $A$ is disjoint from the variables.) The prefix
$\mu$ corresponds to taking least fixed-points: for an expression $t$ with free variables in $V$
and a variable $x$, $\mu x.t$ denotes the least language $L \subseteq (A \cup (V \setminus \{x\}))^\sharp$ such that

$|t|[x \mapsto L] = L$.

This language exists by the well-known Knaster-Tarski theorem, since the function
mapping a language $L' \subseteq (A \cup (V \setminus \{x\}))^\sharp$ to $|t|[x \mapsto L']$ is monotone. (Here, we understand
that when $x$ does not occur free in $t$, then $|t|[x \mapsto L']$ is just $|t|$.) We do not need a
symbol denoting the empty language since it is denoted by the expression $\mu x.x$, where
$x$ is a variable. Also, note that when $|t| = L$ and $x$ is a variable that does not appear in
$t$, then $|\mu x.(tx + \varepsilon)| = L^*$, the union of all finite powers of $L$.
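
The least fixed-point semantics can be tried out on the finite-word fragment. The sketch
below is only an illustration under stated assumptions: it handles finite words, represents
languages as finite sets of Python strings cut off at a length bound `BOUND`, and iterates
from the empty language until nothing changes. The cut-off is harmless for union and
concatenation because concatenation never shortens words. The helper names `concat` and
`lfp` are hypothetical, and the $\omega$-power is not modelled.

```python
from typing import Callable, FrozenSet

Lang = FrozenSet[str]


def concat(k: Lang, l: Lang, bound: int) -> Lang:
    """Concatenation of two finite languages, keeping words of length <= bound."""
    return frozenset(u + v for u in k for v in l if len(u + v) <= bound)


def lfp(step: Callable[[Lang], Lang]) -> Lang:
    """Iterate a monotone `step` from the empty language until it stabilizes."""
    current: Lang = frozenset()
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


BOUND = 6
A, B, EPS = frozenset({"a"}), frozenset({"b"}), frozenset({""})

# |mu x.(a x b + eps)|, restricted to words of length <= BOUND
anbn = lfp(lambda x: concat(concat(A, x, BOUND), B, BOUND) | EPS)

# |mu x.x| is the empty language
empty = lfp(lambda x: x)

# |mu x.(t x + eps)| = L* for |t| = L = {ab}, restricted to length <= BOUND
star = lfp(lambda x: concat(frozenset({"ab"}), x, BOUND) | EPS)
```

With `BOUND = 6`, `anbn` contains exactly the words $a^n b^n$ for $n \le 3$, and `star`
contains the powers of `ab` up to the third.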

Let $\mu T$ denote the fragment of $\mu\omega T$ obtained by removing the $\omega$-power operation.
Clearly, every expression $t \in \mu T$ denotes a language of finite words. It is known that a
language $L \subseteq A^*$ is a CFL iff there is some closed $t \in \mu T$ over $A$ with $|t| = L$ (see [27]
for a closely related result). Using this fact together with Theorem 5.3 we immediately
have:

Corollary 5.4 A language $L \subseteq A^\sharp$ is a BCFL of well-ordered words iff there is a closed
expression $t \in \mu\omega T$ over $A$ with $|t| = L$.

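We give some examples to illustrate Corollary 5.4. Suppose that the alphabet contains
the letters $a, b, c$. Consider the following expressions:

$t_0 = \mu x.(a^\omega x b^\omega + \varepsilon)$

$t_1 = (\mu x.(a^\omega x b^\omega + \varepsilon))^\omega$

$t_2 = \mu y.(\mu x.(a^\omega x b^\omega + \varepsilon) y c + \varepsilon)$

They denote the languages

$L_0 = \{(a^\omega)^n (b^\omega)^n : n \ge 0\}$

$L_1 = L_0^\omega = \{(a^\omega)^{n_0}(b^\omega)^{n_0}(a^\omega)^{n_1}(b^\omega)^{n_1} \cdots : n_i \ge 0\}$

$L_2 = \bigcup \{L_0^n c^n : n \ge 0\}$.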
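
As a worked check of the first example, the least fixed point defining the language of the
first expression can be unfolded by approximation from below. The approximants $X_k$ are
notation introduced here for illustration only; the union of the chain is the least fixed
point because the map $X \mapsto a^\omega X b^\omega \cup \{\varepsilon\}$ distributes over unions of increasing chains.

```latex
\begin{align*}
X_0 &= \emptyset, \qquad X_{k+1} = a^\omega X_k\, b^\omega \cup \{\varepsilon\},\\
X_1 &= \{\varepsilon\},\\
X_2 &= \{\varepsilon,\ a^\omega b^\omega\},\\
X_3 &= \{\varepsilon,\ a^\omega b^\omega,\ (a^\omega)^2 (b^\omega)^2\},\\
\bigcup_{k \ge 0} X_k &= \{(a^\omega)^n (b^\omega)^n : n \ge 0\}.
\end{align*}
```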

We now turn to BCFLs of scattered words. 

Theorem 5.5 Suppose that $L \subseteq A^\sharp$. Then $L$ is a BCFL of scattered words iff $L$ can be
generated from the languages $\{a\}$, for $a \in A$, by the following operations:

1. Substitution into ordinary context-free languages.

2. The operation $L \times L' = \{(u,v) : u \in L, v \in L'\}$, where $L, L' \subseteq A^\sharp$.

3. The operations $U \cup V$, $U \cdot V$ and $U^*$, where $U, V \subseteq A^\sharp \times A^\sharp$.

4. The operation $U^\omega$, where $U \subseteq A^\sharp \times A^\sharp$.

Proof. For $U \subseteq A^\sharp \times A^\sharp$, let us denote by $\overline{U}$ the language $\{u\#v : (u,v) \in U\} \subseteq
(A \cup \{\#\})^\sharp$, where $\#$ is a new symbol. In order to prove that every language generated
from the languages $\{a\}$, $a \in A$, by the above operations is a BCFL of scattered words,
by Lemma 5.1 it suffices to show the following claims.

Claim 1 If $L$ and $L'$ are BCFLs of scattered words and $U = L \times L'$, then $\overline{U}$ is a BCFL
of scattered words.

Claim 2 If $U, V \subseteq A^\sharp \times A^\sharp$ are such that $\overline{U}$ and $\overline{V}$ are BCFLs of scattered words, and if
$W = U \cup V$, $W = U \cdot V$ or $W = U^*$, then $\overline{W}$ is also a BCFL of scattered words.

Claim 3 If $U \subseteq A^\sharp \times A^\sharp$ is such that $\overline{U}$ is a BCFL of scattered words, then $U^\omega$ is a BCFL of
scattered words.

Regarding the first claim, note that if $G = (N, A, R, S, F)$ and $G' = (N', A, R', S', F')$ are
BCFGs generating $L$ and $L'$, respectively, where $N$ and $N'$ are disjoint, and if $U = L \times L'$,
then the BCFG $G'' = (N \cup N' \cup \{S_0\}, A \cup \{\#\}, R \cup R' \cup \{S_0 \to S\#S'\}, S_0, F \cup F')$, where
$S_0$ is a new symbol, generates $\overline{U}$. Moreover, each word in $\overline{U}$ is scattered, since each is a
concatenation of three scattered words.

As for the second claim, let $U, V \subseteq A^\sharp \times A^\sharp$ be such that $\overline{U}$ and $\overline{V}$ are BCFLs of scattered
words. If $W = U \cup V$ then $\overline{W} = \overline{U} \cup \overline{V}$ and it is clear that $\overline{W}$ is a BCFL of scattered
words, since BCFLs of scattered words are closed under union. Now let $W = U \cdot V$.
Since $\overline{W}$ can be obtained from $\overline{U}$ by substituting $\overline{V}$ for $\#$, and since both the class of BCFLs
and the class of languages of scattered words are closed under substitution, it
follows that $\overline{W}$ is a BCFL of scattered words. Finally, suppose that $W = U^*$ and let
$G = (N, A \cup \{\#\}, R, S, F)$ be a BCFG for $\overline{U}$. We may assume that the right side of each
rule contains at most one occurrence of $\#$. Then let $S_0$ be a new nonterminal and define
$G' = (N \cup \{S_0\}, A \cup \{\#\}, R', S_0, F)$, where $R'$ contains the rules $S_0 \to S$ and $S_0 \to \#$, the rules
$X \to p$ in $R$ such that $\#$ does not appear in $p$, and all rules $X \to pS_0q$ such that $X \to p\#q$ is in
$R$. Then $L(G') = \overline{W}$. Since every word in $\overline{W}$ is a concatenation of a finite number of
scattered words, $\overline{W}$ contains only scattered words.

In order to prove the last claim, suppose that $U \subseteq A^\sharp \times A^\sharp$ and $\overline{U}$ is a BCFL of scattered
words. It is clear that each word in $U^\omega$ is scattered, since the underlying linear order
of each word in $U^\omega$ is a scattered sum of scattered linear orderings. To show that $U^\omega$ is a
BCFL, let $G = (N, A \cup \{\#\}, R, S, F)$ be a BCFG generating $\overline{U}$. Let $S_0$ be a new symbol
and consider the BCFG $G' = (N \cup \{S_0\}, A \cup \{\#\}, R', S_0, F \cup \{S_0\})$, where $R'$ consists
of the rule $S_0 \to S$, all rules $X \to p$ in $R$ such that $p$ does not contain $\#$, and all rules
$X \to pS_0q$ such that $X \to p\#q$ is in $R$. (Without loss of generality we may assume that
the right side of each rule in $R$ contains at most one occurrence of $\#$.) Then $L(G') = U^\omega$.

We have thus proved that every language that can be constructed from the primitive
languages $\{a\}$, $a \in A$, by the operations given in the Theorem is a BCFL of scattered
words. The fact that every BCFL of scattered words over $A$ can be constructed follows
from Lemma 5.1 and Theorem 3.2. □
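
The two grammar transformations used for the star and $\omega$-power claims are purely
syntactic and can be sketched as follows. This is a hedged illustration, not the authors'
construction verbatim: grammars are encoded as dictionaries from nonterminals to lists of
right-hand sides, `"#"` stands for the separator symbol, and the function names are
hypothetical. For the star case we take the new start symbol to have both a rule to the old
start symbol and a rule to `#`, which is an assumption about the intended rule set.

```python
from typing import Dict, List, Set, Tuple

Rules = Dict[str, List[List[str]]]  # nonterminal -> list of right-hand sides
HASH = "#"


def replace_hash(rules: Rules, s0: str) -> Rules:
    """Replace the (at most one) '#' in every right-hand side by s0."""
    out: Rules = {}
    for lhs, rhss in rules.items():
        for rhs in rhss:
            # Normal form assumed in the proof: at most one '#' per rule.
            assert rhs.count(HASH) <= 1
        out[lhs] = [[s0 if sym == HASH else sym for sym in rhs] for rhs in rhss]
    return out


def omega_grammar(rules: Rules, start: str, final: Set[str],
                  s0: str = "S0") -> Tuple[Rules, str, Set[str]]:
    """Omega-power case: add S0 -> S, replace '#' by S0, and put S0 into F."""
    assert s0 not in rules
    new_rules = replace_hash(rules, s0)
    new_rules[s0] = [[start]]
    return new_rules, s0, final | {s0}


def star_grammar(rules: Rules, start: str, final: Set[str],
                 s0: str = "S0") -> Tuple[Rules, str, Set[str]]:
    """Star case: add S0 -> S and S0 -> '#', replace '#' by S0 elsewhere; F unchanged."""
    assert s0 not in rules
    new_rules = replace_hash(rules, s0)
    new_rules[s0] = [[start], [HASH]]
    return new_rules, s0, final
```

The only difference between the two results is whether the recursion through `S0` may be
unfolded infinitely often (`S0` in the final set) or must end with the rule `S0 -> #`.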

We may now introduce expressions denoting BCFLs of scattered words. In our definition,
we also use expressions denoting sets of pairs of words. The expressions in $\mu\omega T'$ over
the alphabet $A$ are defined by the following grammar:

$T ::= a \mid \varepsilon \mid x \mid T + T \mid T \cdot T \mid \mu x.T \mid P^\omega$
$P ::= T_0 \times T_0 \mid P + P \mid P \cdot P \mid P^*$

Here, $T_0$ stands for an expression of syntactic category $T$ without free variables.
Expressions corresponding to the syntactic category $P$ denote sets of pairs of words. The
semantics of the expressions should be clear. When $t \in \mu\omega T'$, we write $|t|$ for the
language denoted by $t$.

Corollary 5.6 A language $L \subseteq A^\sharp$ is a BCFL of scattered words iff there is a closed
expression $t \in \mu\omega T'$ (of syntactic category $T$) over $A$ with $|t| = L$.

We again give some examples. But first, let us introduce some abbreviations. When $t$ is
an expression of syntactic category $T$, then let us define $t^\omega$ and $t^{-\omega}$ as the expressions
$(t \times \varepsilon)^\omega$ and $(\varepsilon \times t)^\omega$. Now let

$t_0 = (a^\omega \times b^{-\omega})^\omega$

$t_1 = ((a \times b)^* \cdot (b \times a))^\omega$

$t_2 = \mu x.(a^\omega x b^{-\omega} + \varepsilon)$.

The languages denoted by these expressions are:

$L_0 = \{(a^\omega)^\omega (b^{-\omega})^{-\omega}\}$

$L_1 = \{(a^{n_0} b a^{n_1} b \cdots) \cdot (\cdots a b^{n_1} a b^{n_0}) : n_i \ge 0\}$

$L_2 = \{(a^\omega)^n (b^{-\omega})^n : n \ge 0\}$.
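
The operations on sets of pairs can be explored on finite words. The sketch below is an
illustration under assumptions: pairs are Python tuples of strings, the product is taken
to be $(u,v) \cdot (u',v') = (uu', v'v)$, which is how we read the proof's description of
the product as substituting one encoded language for $\#$ in the other, and the star is cut
off at a fixed number of factors. The names `times`, `dot` and `star` are hypothetical, and
the $\omega$-power itself is not modelled.

```python
from typing import FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]
Pairs = FrozenSet[Pair]


def times(l: Iterable[str], l2: Iterable[str]) -> Pairs:
    """L x L' = {(u, v) : u in L, v in L'} for finite languages."""
    l2 = list(l2)
    return frozenset((u, v) for u in l for v in l2)


def dot(u: Pairs, v: Pairs) -> Pairs:
    """(u1, v1) . (u2, v2) = (u1 u2, v2 v1): left parts grow rightward, right parts leftward."""
    return frozenset((u1 + u2, v2 + v1) for (u1, v1) in u for (u2, v2) in v)


def star(u: Pairs, depth: int) -> Pairs:
    """Products of at most `depth` pairs from u (a finite cut-off of U*)."""
    power: Pairs = frozenset({("", "")})
    result = power
    for _ in range(depth):
        power = dot(power, u)
        result |= power
    return result


# (a x b)* . (b x a), with the star cut off after three factors
p = dot(star(times({"a"}, {"b"}), 3), times({"b"}, {"a"}))
```

Here `p` consists of the pairs $(a^n b, a b^n)$ for $n \le 3$: both components share the
same exponent, which is what a product of two independent languages could not enforce.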



6 Applications 

Suppose that $L \subseteq A^\sharp$ is a language of scattered words. Then let $\mathrm{rmax}(L) = \sup\{r(u) :
u \in L\}$, so that $\mathrm{rmax}(L)$ is an ordinal at most $\omega_1$, the first uncountable ordinal. When
$L = \emptyset$, we define $\mathrm{rmax}(L) = -\infty$. (We understand that $-\infty < \alpha$ and $-\infty + \alpha = -\infty$ for
all ordinals $\alpha$.) In [22], it was shown that $\mathrm{rmax}(L)$ is finite for every nonempty BCFL
of scattered words. We give a new proof of this result.

Theorem 6.1 Suppose that $L \subseteq A^\sharp$ is a scattered BCFL. Then $\mathrm{rmax}(L)$ is either finite
or $-\infty$.

Proof. We use Lemma 5.1. When $L = \{a\}$, for some letter $a \in A$, then $\mathrm{rmax}(L) = 0$.
Suppose now that $L = K[b \mapsto L_b]$, where $K$ is a CFL or a linear BCFL over some
alphabet $B$ and each $L_b \subseteq A^\sharp$ is a scattered BCFL. Suppose that we have already shown
that $\mathrm{rmax}(L_b)$ is finite for each $b \in B$. If $K = \emptyset$ then $L = \emptyset$ and $\mathrm{rmax}(L) = -\infty$. If
$K = \{\varepsilon\}$ then $L = \{\varepsilon\}$ and $\mathrm{rmax}(L) = 0$. If $K \neq \{\varepsilon\}$ then let $B_0$ denote the set of all
letters of $B$ that occur in some word $u$ of $K$ such that $L_b \neq \emptyset$ for all $b$ occurring in $u$.
We have $\mathrm{rmax}(L) \le 1 + \max\{\mathrm{rmax}(L_b) : b \in B_0\}$. □

For a scattered language $L \subseteq A^\sharp$, let us define $\mathrm{rrange}(L) = \{r(u) : u \in L\}$.

Theorem 6.2 Suppose that $L \subseteq A^\sharp$ is a scattered BCFL. Then $\mathrm{rrange}(L)$ is a finite set
of integers that can be computed from a BCFG generating $L$.

Proof. We already know that $\mathrm{rrange}(L)$ is a finite subset of the integers. We use
Lemma 5.1 to show it is computable. When $L = \{a\}$, for some letter $a \in A$, then
$\mathrm{rrange}(L) = \{0\}$. Suppose now that $L = K[b \mapsto L_b]$, where $K$ is a CFL or a linear
BCFL over an alphabet $B$, and each $L_b$ is a BCFL. Suppose that we have already
computed the sets $\mathrm{rrange}(L_b)$, $b \in B$. Since both CFLs and linear BCFLs are effectively
closed under substitution of finite languages of finite words, we may assume that none
of the $L_b$ is empty, and none of them contains $\varepsilon$.

Given a grammar for $K$ and grammars for the $L_b$, we may compute the set $\Gamma \subseteq P(B) \times
P(B)$ consisting of all pairs $(H_0, H_1)$ such that for some word $u \in K$, $H_0$ is the set of
all letters that occur a finite number of times in $u$, and $H_1$ is the set of all letters that
occur an infinite number of times in $u$.

Suppose that $(H_0, H_1) \in \Gamma$, say $H_0 = \{b_1, \ldots, b_k\}$, $H_1 = \{c_1, \ldots, c_\ell\}$. Then for all
families $n_i \in \mathrm{rrange}(L_{b_i})$ and $m_j \in \mathrm{rrange}(L_{c_j})$, $i = 1, \ldots, k$, $j = 1, \ldots, \ell$, take the
integer $n = \max\{n_i, m_j + 1 : i = 1, \ldots, k,\ j = 1, \ldots, \ell\}$. Then $\mathrm{rrange}(L)$ is the set of all
these integers $n$, for all possible choices of $(H_0, H_1)$ in $\Gamma$, and for all possible choices of
the $n_i$ and $m_j$.

The fact that $\mathrm{rrange}(L)$ can be computed from a BCFG for $L$ now follows by the
constructive proof of Lemma 5.1. □
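
The combination step of this proof is easy to state in code once the set of pairs of
letter sets is available. The sketch below assumes that set is already given as a list of
pairs of frozensets (computing it from a grammar is not attempted here) and that the rank
ranges of the substituted languages are given as a dictionary. The function name and the
encoding are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Letters = FrozenSet[str]


def rrange(gamma: Iterable[Tuple[Letters, Letters]],
           ranges: Dict[str, Set[int]]) -> Set[int]:
    """All n = max{n_i, m_j + 1} over pairs (H0, H1) and all choices of ranks."""
    result: Set[int] = set()
    for h0, h1 in gamma:
        # One rank per letter occurring finitely often ...
        finite = [sorted(ranges[b]) for b in sorted(h0)]
        # ... and one rank plus one per letter occurring infinitely often.
        infinite = [[m + 1 for m in sorted(ranges[c])] for c in sorted(h1)]
        for choice in product(*finite, *infinite):
            # Assumption: an empty family (the empty word) contributes rank 0.
            result.add(max(choice, default=0))
    return result
```

For example, if `b` has ranks `{0, 2}`, `c` has ranks `{0, 1}`, and the pairs are
`({b}, {c})` and `({b}, {})`, the first pair contributes `{1, 2}` and the second `{0, 2}`.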

Suppose that $L \subseteq A^\sharp$ is a scattered language. Then let $\mathrm{rmin}(L) = \min\{r(u) : u \in L\}$,
so that $\mathrm{rmin}(L)$ is a countable ordinal. When $L = \emptyset$, we define $\mathrm{rmin}(L) = \infty$. As a
corollary to the previous result, it is clear that for a BCFL $L$ generated by a BCFG $G$,
$\mathrm{rmin}(L)$ is either finite or $\infty$, and that $\mathrm{rmin}(L)$ can be effectively computed along the
lines of the previous proof. However, for each $(H_0, H_1)$ in $\Gamma$, it now suffices to compute
only one integer, $\max\{n_i, m_j + 1 : i = 1, \ldots, k,\ j = 1, \ldots, \ell\}$, where, using the above
notation, $n_i = \mathrm{rmin}(L_{b_i})$ and $m_j = \mathrm{rmin}(L_{c_j})$ for all $i$ and $j$. Then $\mathrm{rmin}(L)$ is the
minimum of all these integers.
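
The simplification for the minimum can be sketched in the same style. As before, this is
only an illustration: the set of pairs of letter sets is assumed to be given, the minimal
ranks of the substituted languages are passed as a dictionary, and `float("inf")` stands in
for the value assigned to the empty language. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Letters = FrozenSet[str]


def rmin(gamma: Iterable[Tuple[Letters, Letters]], mins: Dict[str, int]) -> float:
    """Minimum over (H0, H1) of max{rmin(L_b), rmin(L_c) + 1 : b in H0, c in H1}."""
    best = float("inf")  # value for the empty language
    for h0, h1 in gamma:
        candidates = [mins[b] for b in h0] + [mins[c] + 1 for c in h1]
        # Assumption: an empty family (the empty word) contributes rank 0.
        best = min(best, max(candidates, default=0))
    return best
```

Only one integer per pair is computed, instead of one per choice of ranks as in the
computation of the full range.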